Importance of methodological choices in data manipulation for validating epileptic seizure detection models. (arXiv:2302.10672v1 [cs.LG])
Epilepsy is a chronic neurological disorder that affects a significant
portion of the human population and poses serious risks to patients in their
daily lives. Despite advances in machine learning and IoT, small, non-stigmatizing
wearable devices for continuous monitoring and detection in outpatient
environments are not yet available. Part of the reason is the complexity of
epilepsy itself, including highly imbalanced data, its multimodal nature, and
highly subject-specific signatures. Another problem, however, is the heterogeneity of
methodological approaches in research, leading to slower progress, difficulty
comparing results, and low reproducibility. Therefore, this article identifies
a wide range of methodological decisions that must be made and reported when
training and evaluating the performance of epilepsy detection systems. We
characterize the influence of individual choices using a typical ensemble
random-forest model and the publicly available CHB-MIT database, providing a
broader picture of each decision and giving good-practice recommendations,
based on our experience, where possible.
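
One reason such decisions need to be reported is the highly imbalanced data mentioned above: the choice of performance metric alone can change how a detector appears to perform. The following is a minimal sketch on synthetic labels, not the article's code or data. The helper names (`confusion_counts`, `summarize`) and the 10-in-1000 seizure ratio are assumptions made purely for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = seizure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, precision and F1, using 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Synthetic, illustrative labels (assumption): 10 seizure windows in 1000.
y_true = [1] * 10 + [0] * 990

# A trivial "detector" that never predicts a seizure.
always_negative = [0] * len(y_true)

report = summarize(y_true, always_negative)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the trivial detector scores high on accuracy while its sensitivity and F1 are zero, which is why a study on imbalanced data should state which metrics it reports.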